Scaling up classification rule induction through parallel processing

Authors

  • Frederic T. Stahl
  • Max Bramer

Abstract

The rapid growth in the size and number of databases demands data mining approaches that scale to large amounts of data. This has led to the exploration of parallel computing technologies that perform data mining tasks concurrently on several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
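Many parallel rule induction systems spend most of their time gathering class-frequency statistics for attribute values. One common way to spread that work over several processors is data parallelism: split the training data into partitions, count each partition independently, then merge the partial counts. The sketch below illustrates that idea only. It is not the specific method of any system surveyed in the paper, and the function names `local_counts` and `parallel_counts` and the toy weather-style records are hypothetical. It uses `ThreadPoolExecutor` so that it stays self-contained. Because of CPython's global interpreter lock, threads will not speed up this pure-Python counting. For real speed-ups you would use `ProcessPoolExecutor` with the functions at module level, or a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition):
    """Count (attribute, value, class) triples in one horizontal data partition."""
    counts = Counter()
    for attrs, label in partition:
        for attr, value in attrs.items():
            counts[(attr, value, label)] += 1
    return counts

def parallel_counts(records, n_workers=4):
    """Split records into roughly equal chunks, count each chunk concurrently,
    then merge the partial counts (a map/reduce-style aggregation)."""
    chunk = max(1, -(-len(records) // n_workers))  # ceiling division
    partitions = [records[i:i + chunk] for i in range(0, len(records), chunk)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(local_counts, partitions):
            total.update(partial)  # Counter.update adds counts together
    return total

# Hypothetical toy data: (attribute-value dict, class label) pairs.
records = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rain", "windy": "true"}, "yes"),
    ({"outlook": "rain", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
]
counts = parallel_counts(records, n_workers=2)
print(counts[("outlook", "sunny", "no")])  # 2
```

The merged counts are identical to those from a single sequential pass. Quantities such as p(class | attribute = value), which rule induction algorithms rank candidate terms by, can be computed directly from them.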


Similar resources

Parallel Rule Induction with Information Theoretic Pre-Pruning

In a world where data is captured on a large scale, the major challenge for data mining algorithms is to scale up to large datasets. There are two main approaches to inducing classification rules: the divide and conquer approach, also known as the top down induction of decision trees, and the separate and conquer approach. A considerable amount of work ...
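In the divide and conquer approach (TDIDT), the tree is built by repeatedly choosing an attribute to split on and then recursing into each branch. The sketch below shows only that selection step, using ID3-style information gain (entropy reduction) on categorical attributes. This is one common criterion; C4.5, for example, uses gain ratio instead. The function names and the toy records are hypothetical and illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attr):
    """Entropy reduction obtained by splitting records on a categorical attribute."""
    labels = [label for _, label in records]
    groups = defaultdict(list)  # attribute value -> class labels in that branch
    for attrs, label in records:
        groups[attrs[attr]].append(label)
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(records, attributes):
    """One TDIDT step: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a))

# Hypothetical toy data: (attribute-value dict, class label) pairs.
records = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rain", "windy": "true"}, "yes"),
    ({"outlook": "rain", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
]
print(best_split(records, ["outlook", "windy"]))  # outlook
```

Here `outlook` separates the classes perfectly, so its gain equals the full class entropy. Splitting on `windy` leaves almost all of the uncertainty in place.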

P-Prism: A Computationally Efficient Approach to Scaling up Classification Rule Induction

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model, in the form of classification rules, from a dataset in order to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than rules induced by TDIDT. However, along with the increasing ...
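Prism is a separate and conquer algorithm. For a target class it grows one rule at a time, greedily adding the attribute-value term that maximises the probability of the target class among the records the rule still covers. It then removes the records the finished rule covers and repeats until no target records remain. The following is a simplified sketch under stated assumptions, not the implementation used in any of the papers listed here. It handles categorical attributes only and one target class per call. It breaks ties in favour of the term covering more target records, which is an assumption of this sketch. It stops growing a rule when attributes run out, rather than handling clashes more carefully. The function names and toy data are hypothetical.

```python
def induce_rule(records, target):
    """Grow one rule for `target` by adding, at each step, the term (attr, value)
    with the highest p(target | term) on the records the rule still covers."""
    covered = records
    rule = {}
    while any(label != target for _, label in covered):
        best = None  # (probability, target hits, attr, value)
        for attr in covered[0][0]:
            if attr in rule:
                continue
            for value in {attrs[attr] for attrs, _ in covered}:
                subset = [r for r in covered if r[0][attr] == value]
                hits = sum(1 for _, label in subset if label == target)
                score = (hits / len(subset), hits)  # ties broken by hits
                if best is None or score > best[:2]:
                    best = (score[0], score[1], attr, value)
        if best is None:  # no attributes left: the rule cannot be made pure
            break
        _, _, attr, value = best
        rule[attr] = value
        covered = [r for r in covered if r[0][attr] == value]
    return rule

def prism(records, target):
    """Separate and conquer: learn rules until every `target` record is covered."""
    remaining = list(records)
    rules = []
    while any(label == target for _, label in remaining):
        rule = induce_rule(remaining, target)
        rules.append(rule)
        # "Separate": drop the records the new rule covers.
        remaining = [r for r in remaining
                     if not all(r[0][a] == v for a, v in rule.items())]
    return rules

# Hypothetical toy data: (attribute-value dict, class label) pairs.
records = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rain", "windy": "true"}, "yes"),
    ({"outlook": "rain", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
]
print(prism(records, "yes"))  # [{'outlook': 'rain'}, {'outlook': 'overcast'}]
```

Each returned dict reads as a standalone rule, for example IF outlook = rain THEN yes. Because the rules for each class are learned independently and need not share a common root attribute, they are "modular" in the sense used above, unlike the branches of a decision tree.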

J-PMCRI: A Methodology for Inducing Pre-pruned Modular Classification Rules

Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top ...

Combining Machine Learning and Active Objects for Parallel Data Mining

Nowadays, the necessity and usefulness of the field of Data Mining (DM) and Knowledge Discovery in Databases (KDD) are largely established by both the scientific and industrial communities; and a number of real applications have already been developed in domains ranging from space data to financial analysis [1]. However, the need for scaling up DM algorithms is a natural requirement of the more...

PMCRI: A Parallel Modular Classification Rule Induction Framework

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not...

Journal:
  • Knowledge Eng. Review

Volume 28

Published 2013